182 research outputs found
True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4
Large language models (LLMs) have demonstrated solid zero-shot reasoning
capabilities, which is reflected in their performance on the current test
tasks. This calls for a more challenging benchmark requiring highly advanced
reasoning ability to be solved. In this paper, we introduce such a benchmark,
consisting of 191 long-form (1200 words on average) mystery narratives
constructed as detective puzzles. Puzzles are sourced from the "5 Minute
Mystery" platform and include a multiple-choice question for evaluation. Only
47% of humans solve a puzzle successfully on average, while the best human
solvers achieve over 80% success rate. We show that GPT-3 models barely
outperform random on this benchmark (with 28% accuracy) while state-of-the-art
GPT-4 solves only 38% of puzzles. This indicates that there is still a
significant gap between the deep reasoning abilities of LLMs and humans and
highlights the need for further research in this area. Our work introduces a
challenging benchmark for future studies on reasoning in language models and
contributes to a better understanding of the limits of LLMs' abilities.
Comment: 5 pages, to appear at *SE
Multi-Domain Neural Machine Translation
We present an approach to neural machine translation (NMT) that supports
multiple domains in a single model and allows switching between the domains
when translating. The core idea is to treat text domains as distinct languages
and use multilingual NMT methods to create multi-domain translation systems. We
show that this approach results in significant translation quality gains over
fine-tuning. We also explore whether knowledge of pre-specified text domains is
necessary. It turns out that it is, but also that quite high translation
quality can be reached when the domains are not known.
Comment: Accepted to EAMT'2018, In Proceedings of the 21st Annual Conference
of the European Association for Machine Translation (EAMT'2018)
Voting and Stacking in Data-Driven Dependency Parsing
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 219-222.
© 2009 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT)
http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia)
http://hdl.handle.net/10062/9206
DNA Repair Proteins as Molecular Targets for Cancer Therapeutics
Cancer therapeutics include an ever-increasing array of tools at the disposal of clinicians in their treatment of this disease. However, cancer is a tough opponent in this battle, and current treatments, which typically include radiotherapy, chemotherapy and surgery, are often not enough to rid the patient of his or her cancer. Cancer cells can become resistant to the treatments directed at them, and overcoming this drug resistance is an important research focus. Additionally, increasing discussion and research is centering on targeted and individualized therapy. While a number of approaches have undergone intensive and close scrutiny as potential ways to treat and kill cancer (signaling pathways, multidrug resistance, cell cycle checkpoints, anti-angiogenesis, etc.), much less work has focused on blocking the ability of a cancer cell to recognize and repair the damaged DNA which primarily results from the front-line cancer treatments: chemotherapy and radiation. More recent studies on a number of DNA repair targets have produced proof-of-concept results showing that selective targeting of these DNA repair enzymes has the potential to enhance and augment the currently used chemotherapeutic agents and radiation, as well as to overcome drug resistance. Some of the targets identified result in the development of effective single-agent anti-tumor molecules. While it is inherently convoluted to think that inhibiting DNA repair processes would be a likely approach to kill cancer cells, careful identification of specific DNA repair proteins is increasingly appearing to be a viable approach in the cancer therapeutic cache.
Mixing and blending syntactic and semantic dependencies
Our system for the CoNLL 2008 shared task uses a set of individual parsers, a
set of stand-alone semantic role labellers, and a joint system for parsing and
semantic role labelling, all blended together. The system achieved a
macro-averaged labelled F1-score of 79.79 (WSJ 80.92, Brown 70.49) for the
overall task. The labelled attachment score for syntactic dependencies was
86.63 (WSJ 87.36, Brown 80.77) and the labelled F1-score for semantic
dependencies was 72.94 (WSJ 74.47, Brown 60.18).
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation